How rings assign address space

Step 1: align the incoming address to self (to some power of 2)
Step 2: assign the result to the self address

Step 3: next_addr = self_addr + self_addr_space  // number of registers used locally
Step 4: send next_addr down the ring

DMA needs 16 addresses

UART needs 4, self = 32
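
The four steps above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes each member rounds its register count up to a power of 2 before aligning, which the notes imply but do not state outright.

```python
def next_pow2(n: int) -> int:
    """Smallest power of 2 that is >= n (n must be >= 1)."""
    return 1 << (n - 1).bit_length()


def enumerate_member(incoming_addr: int, regs_needed: int):
    """Return (self_addr, next_addr) for one ring member.

    Assumption: the member's address space is its register count
    rounded up to a power of 2.
    """
    space = next_pow2(regs_needed)
    # Steps 1 and 2: align the incoming address up to a multiple of `space`
    self_addr = (incoming_addr + space - 1) & ~(space - 1)
    # Step 3: the address handed to the next member
    next_addr = self_addr + space
    # Step 4 (sending next_addr downstream) is left to the caller
    return self_addr, next_addr


print(enumerate_member(32, 4))   # a member needing 4 addresses, arriving at 32
print(enumerate_member(20, 16))  # a member needing 16 addresses, arriving at 20
```

Because the self address is always a multiple of the member's own space, the low bits of any address inside that space can be used directly as the internal register index.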
[Figure labels: clk1, clk2]

If the delay between "clk1" and "clk2" is
greater than the delay from Q to D of the
second flip-flop, we have a race on our
hands, meaning the right-hand flip-flop will
sample the data of Q a whole clock period
early.
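
The condition above reduces to a single comparison. A minimal sketch, with a hypothetical function name and with both delays expressed in the same unit:

```python
def has_race(clk1_to_clk2_delay: float, q_to_d_delay: float) -> bool:
    """True when the second flip-flop may sample Q one clock period early.

    The race exists when the clock arrives at the second flip-flop later
    than the new data does, i.e. the clock delay exceeds the Q-to-D delay.
    """
    return clk1_to_clk2_delay > q_to_d_delay


print(has_race(0.4, 0.6))  # data path slower than clock skew
print(has_race(0.8, 0.6))  # clock skew exceeds data path delay
```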



[Figure labels: compound A, compound B]



Clock runs with data.

The problem is a possible race.
However, we control the logic on
each flip-flop leaving the compound;
because it is always the same standard
ring-interface module, we can ensure
that the delay will be at least enough
and, more importantly, easily checked
after layout.
[Figure labels: clka, compound a, data_a, compound b]



Clock opposes data.

This arrangement has the advantage
of automatically ensuring that no race
condition exists (at least in this simple
case).

[Figure label: data_b]



[Waveform labels: clkb, clka, Qa, data_a]



data_a, which changes after clkb,
which is later than clka, is sampled
by clka. NO RACE.



[Figure labels: clka, OK, compound B, compound A]
[Figure labels: latch, clkb, Qb, ff, clka, clock]

[Waveform labels: clka, clkb, data_a, clock, uncertainty range]



data_a leaving the bridge goes to member "b" and
should be sampled there by the rising edge of clkb. clkb
lags a lot behind clka of the bridge. As clearly seen
from the waveforms, a race is imminent. Here we
should add latches for all the data lines (-90).
Adding a latch works, however, only if the delay between
clka and clkb is less than 75% of the cycle time;
otherwise the uncertainty kills the usable time. This
sets a hard limit on the number of ring members.
Also keep in mind that latches are needed on each OK
signal between members of the ring.
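
The 75% rule and the resulting member limit can be illustrated as follows. The function names are hypothetical, and the second function additionally assumes that clock lag accumulates by a fixed amount per member, which the notes do not state.

```python
import math


def latch_fix_usable(clk_lag: float, cycle_time: float, limit: float = 0.75) -> bool:
    """True when the clka-to-clkb lag is below 75% of the cycle time."""
    return clk_lag < limit * cycle_time


def max_ring_members(cycle_time: float, lag_per_member: float, limit: float = 0.75) -> int:
    """Largest member count whose total lag stays strictly below the limit.

    Assumption (not from the source): every member adds the same lag.
    """
    return math.ceil(limit * cycle_time / lag_per_member) - 1


print(latch_fix_usable(5.0, 10.0))   # lag is 50% of the cycle
print(latch_fix_usable(8.0, 10.0))   # lag is 80% of the cycle
print(max_ring_members(10.0, 1.0))
```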
[Waveform labels: clka, data, clock, uncertainty range]



Here, data_b leaves member "b" to be sampled by
clka in the bridge. But now clkb lags a lot behind
clka. This actually works to our advantage if the
lag is smaller than the better part of a clock cycle. This
solution looks better because, between adjacent
members, we can take care to delay the data
beyond the danger zone of clock delay; the OK signals
are covered automatically, and the last-leg data is also
covered. The only signal not safe is the OK from the
bridge to the "b" member. It will need a latch in "b".



[Figure labels: big module, data_in, local_clock, local_data_out, data from previous member, clk, ring interface, data, clock]



The local clock lags behind the
ring-interface clock of this
module, because we presume the
module is big. For data coming
out, this is not a problem: it changes
later than the ring-interface flip-flops' clock.
However, for data entering the
module from the previous member,
a race is a possibility we must
look into.
If module "a" sends a message to module "b", the ring works
fine. However, if most of the traffic is from "c" to "b",
this is more expensive in terms of latency.



Another problem is "peak latency". Suppose that "a"
transmits mostly to "d" and "b" mostly to "c". In this case
communication between "b" and "c" suffers degradation
when the peak traffic coincides.
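
The latency difference can be made concrete by counting hops. The sketch below assumes a one-directional ring of four members in the order a, b, c, d; the function name and that ordering are illustrative assumptions.

```python
def ring_hops(src: str, dst: str, members: list) -> int:
    """Number of hops from src to dst on a one-directional ring."""
    i, j = members.index(src), members.index(dst)
    return (j - i) % len(members)


ring = ["a", "b", "c", "d"]   # assumed traversal order
print(ring_hops("a", "b", ring))  # neighbour in the direction of travel
print(ring_hops("c", "b", ring))  # must go almost all the way around
```

With this ordering, traffic from "a" to "d" also passes through the link between "b" and "c", which is why coinciding peaks degrade the "b" to "c" traffic.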
Land bridge gets its name from the fact that it is
a luxury. It spans across connected modules.
The idea is simple. When V2 sends a message to
D1 it gets to one side of the bridge. This side
analyzes the destination address and by some
magic (explained later) decides to short-cut the
path. The message re-appears at the other end of
the bridge and gets to D1 fast. By the same magic,
a message from V1 to D2 gets bypassed also;
a message from V1 to D1 is treated directly.







Enumeration is started by the "Anchor",
which assigns address=1 to itself; the results
of enumeration are labels 1 to 7. The land
bridge gets two addresses, as if it were not
one module: there is the "near" end, which got
enumeration label "3", and the "far" end,
marked 6.
msg1 and msg2 arrive at the same time;
the bridge end must make a decision
which message to forward first.

It can be shown that an unwise decision can
lead to freeze-out, deadlock and option price
dropping to 5$.

Therefore msg2 gets the priority.
[Figure labels: fifo, logic, dmsg, dmsg, umsg, fifo]
The bridge takes responsibility for strays,
but only at the "far" end. During
enumeration, the bridge is "polarized" to
have a near and a far end. Near is the end
first struck by the enumeration message.



So we have exactly one enforcer for each
ring.



3 near 




In a land bridge ring, the situation is trickier. If V2
sends a message to address==5, the land bridge
diverts it at the far end. It will re-appear at 3 and
start cycling forever.

We have to define an algorithm that will take
care of all cases.

Luckily there is a way.

The land bridge deals only with messages arriving
at the far end and being diverted. It marks and
monitors only those. Messages arriving at the near
end keep their markings. Messages at the far end
going through are left alone.
msg_type
msg_addr

msg_data(63:8): not used for scan
msg_data(7:0): used for scan chains
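
A small sketch of the message fields and the scan-chain byte. The class and method names are hypothetical; the field widths in the comments follow the type_out[7:0], address_out[23:0] and data_out[63:0] signals that appear on other sheets.

```python
from dataclasses import dataclass


@dataclass
class RingMessage:
    msg_type: int   # 8 bits, per type_out[7:0]
    msg_addr: int   # 24 bits, per address_out[23:0]
    msg_data: int   # 64 bits, per data_out[63:0]

    def scan_byte(self) -> int:
        """msg_data(7:0), the only data bits used for scan chains."""
        return self.msg_data & 0xFF


msg = RingMessage(msg_type=0x01, msg_addr=0x000020, msg_data=0x1122334455667788)
print(hex(msg.scan_byte()))
```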
[Figure labels: type, addr, data (64), ok, scan_test, reset, clk]

[Waveform labels: clk; type (idle, msgA, msgB, idle); ok]



During the first clock, OK remains active while type
is that of msgA. This means that on the next clock
memberA may send a new message. memberA uses
this OK to send msgB on the next clock. msgB gets
stuck for a clock because OK goes inactive. It goes
inactive because the FIFO in memberB is full. One
clock later, the FIFO has a free entry, so OK returns to
1 and type returns to idle on the next clock. The return to idle
could also be a change to the next message, if there was
one.
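
The waveform described above can be reproduced with a tiny clock-by-clock model. This is a simplified sketch with a hypothetical function name: it takes the OK value seen on each clock as an input rather than modelling the FIFO in memberB.

```python
def bus_trace(messages, ok_per_clock):
    """Return what appears on the type lines at each clock.

    The sender keeps the current message on the bus until a clock on
    which OK is active, then moves on to the next message (or to idle).
    """
    pending = list(messages)
    trace = []
    for ok in ok_per_clock:
        trace.append(pending[0] if pending else "idle")
        if pending and ok:
            pending.pop(0)  # message accepted on this clock
    return trace


# OK is active, drops for one clock (FIFO full), then returns to 1
print(bus_trace(["msgA", "msgB"], [1, 0, 1, 1]))
```

In this run msgB occupies the bus for two clocks, matching the one-clock stall in the waveform.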



[Figure labels: imsg, iok, omsg, ook]
The incoming messages are examined first
to see whether each is a supervisor or a work/program message.
Work/program messages have an address field.
We check if it is our address. Since we know
that our address is aligned to our power of 2,
the address mask (named split mask)
causes only a certain number of upper bits to
be compared. The lower part of the address
is passed inside as the internal address. The
upper bits are compared against the self-address
register. This register gets its value during the
enumeration protocol. The lower part of this
register is always masked. Hopefully
synthesis will delete the implementation of
the unused bits.

[Figure labels: incoming address, address split mask, comparator, ours/through, part of the address that enters the member]
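
The split-mask comparison can be sketched as below. The function and parameter names are hypothetical; `space_bits` stands for the number of low address bits that belong to the member, which the notes express as a mask.

```python
def split_address(incoming: int, self_addr: int, space_bits: int):
    """Return (ours, internal_addr) for an incoming work/program address.

    Upper bits are compared against the self-address register; the lower
    `space_bits` bits are passed inside as the internal address.
    """
    low_mask = (1 << space_bits) - 1
    ours = (incoming & ~low_mask) == (self_addr & ~low_mask)
    return ours, incoming & low_mask


# A member at self address 32 owning 4 addresses (2 low bits)
print(split_address(34, 32, 2))  # inside the member's space
print(split_address(36, 32, 2))  # belongs to someone else, pass through
```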
[Figure labels: Member, Rif_i_*, Rif_*, Rif_o_*, module_id, RIF, Address Space = 7, Activation register]
The second land bridge solves most traffic problems, but
adds 4 clocks to the overall ring length. This is not a big
problem because no message should travel the whole
perimeter.




The Utopia interface is
forced into a mode that
communicates in
messages, not cells. We are
using the I/O and maybe
some of the logic.



Vobla Network Processor

Application Specific Accelerators: CRC, Encryption, Table Lookup, Hashing

Internal Memory: Fast, Unified, Multi-port

Doorbell

Peripheral Expansion Area: Enet, ATM, UART, USB, Serials

System Expansion Area: CPU (PP), DMA, Smart FIFO, Ext. mem I/F
Vobla Core

[Figure labels: FTU, Internal Memory, LSU, Program Sequencer (PSU), PBU, Register File, Arithmetic (DALU), Agent I/F (AGI), Debug, Doorbell, DMA Agent, Agent Bus, External World I/F]

Vobla Compounds
[Figure labels: ram data out]

Blocks: General purpose registers, Special purpose registers, Doorbells, Agent interface, Configuration registers, Internal memory, External memory

Annotations:
32 registers of 32 bits per task
A set of indications per task, which control task execution
An interface to adjacent resources
Fast memory accessed by load/store instructions
Per task and global registers
Initialized by the PP
Big area accessed via a DMA interface
RF register:

s - sticky bit
eq - equal/zero
lt - less than/negative
gt - greater than/positive
c - carry
mb - reflection of the RAM multi-reader busy indication.

[Figure: register bit layouts, bits 31 to 0, by spr index]
Frame structure of an example task type

[Figure labels: common task data; task fragment 1 data; task fragment 2 data; task fragment 3 data; data of all level2 functions; level1 f1 data; level1 f2 data; level1 f3 data]

Size of the level0 frame part is different for each task type.
Size of the level2 frame part is constant.
Size of the level1 frame part is different for each task type.
[Legend: Flip Flop, Logic]
[Figure labels: yield indication, message out fifo, splitter, ringif]
[Figure labels: interface_address[15:0], interface_data[31:0], type_out[7:0], address_out[23:0], data_out[63:0], clk, ok2drive, reset]

agent command (AID = multireader): byte[0], byte[1], byte[2], byte[3]

vobla entry in multireader
Options: increment address, first, noop, last
Fields: source addr in sram, address of destination, number of bytes to transfer
Vobla

[Figure labels: agent interface, main entry, shadow entry, output message encoder, type_out[7:0], address_out[23:0], data_out[63:0], reset, clk, ok2drive]

agent command (AID = message_sender), option[6] = 0:
opcode, options[9:0], RA, AID[4:0], RB/imm8
message_sender32: {0, options[5:0]}, raw_data[31:0], raw_address[23:0], type[7:0]
(message data, address of destination, message type)

agent command (AID = message_sender), option[6] = 1:
opcode, options[9:0], RA, AID[4:0], RB
message_sender64: {1, options[5:0]}, raw_data[63:0], raw_address[23:0], 10000000
(message data, address of destination, message type)



[Figure labels: doorbell set mask (x4), Vobla, agent interface, stall vobla, token control, id[5:0], request entries x2, DMA context table, input message decoder, output message encoder, type_in[7:0], write_interface_address[23:0], write_interface_data[31:0], address[7:0], type_out[7:0], address_out[23:0], data_out[63:0], ok2drive]
agent command (AID = dma agent)

dma request entry
Options: modify address, urgent, dir, autosend, long, set ack address
Fields: dram address, sram address, number of bytes to transfer or address modifier
Flags: last in transfer, calculate crc, multireader



Vobla

[Figure labels: multiread_data[31:0], agent interface, stall vobla, register file, input buffer, random number generator, CRC data, on demand, CRC machine, checksum machine, bip16 machine, clk, reset]

agent command (AID = CRC agent): opcode, options[9:0], RA, AID[4:0]
Data: DATA[63:32], DATA[31:0], RESIDUE
Options: CRC type, data size, generate/check, operation mode, overwrite residue
Vobla

[Figure labels: agent interface, control register, counter, prescaler, div by, time stamp register, clk, reset]

agent command (AID = timer agent): opcode
[Figure labels: mask control, priority logic, logic]

agent command (AID = doorbell): opcode, options[9:0], RA, AID[4:0], RB/imm8
Options: set/clear global (OP2), clear request (OP1), clear mask (OP0); task register mask

{0,0,0,0,0,mask_bit_index[2:0]}   write mask
or
{0,0,0,0,1,req_bit_index[2:0]}    write request
or
{1,0,0,0,0,count_value[2:0]}      write counter
or
{0,1,0,0,0,0,0,0}                 write TGMR
or
{1,1,0,0,0,0,0,urgent_value}      write urgent
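
The five doorbell write encodings can be told apart with a short decoder. This is a sketch under two assumptions: the braces are read as a bit concatenation with bit 7 on the left, and the function name and returned labels are hypothetical.

```python
def decode_doorbell_write(byte: int):
    """Classify an 8-bit doorbell write value as (operation, field)."""
    top2 = (byte >> 6) & 0b11   # bits 7 and 6 select counter/TGMR/urgent
    low3 = byte & 0b111         # 3-bit index or count field
    if top2 == 0b11:
        return ("write urgent", byte & 1)
    if top2 == 0b01:
        return ("write TGMR", None)
    if top2 == 0b10:
        return ("write counter", low3)
    if byte & 0b1000:           # bit 3 distinguishes request from mask
        return ("write request", low3)
    return ("write mask", low3)


for value in (0b00000101, 0b00001010, 0b10000111, 0b01000000, 0b11000001):
    print(f"{value:08b}", decode_doorbell_write(value))
```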
[Figure labels: Functional unit, External ring interface]
[Figure labels: Trajan memory interface, Trajan FIFO_RD input, Trajan FIFO_WR input, Host address generator, Target address generator, Synchronizer (x3), Interrupt, FPGA, Interface to external device (device specific)]
urgent viscode
send ahead address 
threshold to transmit 
threshold to urgent 



doorbell 
free entry count 
[Figure label: ring control]